rustdoc-search: tighter encoding for f index #119468
(rustbot has picked a reviewer for you, use r? to override)
Some changes occurred in HTML/CSS/JS. cc @GuillaumeGomez, @jsha
Two optimizations for the function signature search:

* Instead of using JSON arrays, like `[1,20]`, it uses VLQ hex with no commas, like `[aAd]`.
* This also adds backrefs: if you have more than one function with exactly the same signature, it'll not only store it once, it'll *decode* it once, and store in the typeIdMap only once.

Size change
-----------

standard library

```console
$ du -bs search-index-old.js search-index-new.js
4976370 search-index-old.js
4404391 search-index-new.js
```

((4976370-4404391)/4404391)*100% ≈ 13.0% (difference relative to the new size)

Benchmarks are similarly shrunk:

```console
$ du -bs tmp/{arti,cortex-m,sqlx,stm32f4,ripgrep}/toolchain_{old,new}/doc/search-index.js
10555067 tmp/arti/toolchain_old/doc/search-index.js
8921236 tmp/arti/toolchain_new/doc/search-index.js
77018 tmp/cortex-m/toolchain_old/doc/search-index.js
66676 tmp/cortex-m/toolchain_new/doc/search-index.js
2876330 tmp/sqlx/toolchain_old/doc/search-index.js
2436812 tmp/sqlx/toolchain_new/doc/search-index.js
63632890 tmp/stm32f4/toolchain_old/doc/search-index.js
52337438 tmp/stm32f4/toolchain_new/doc/search-index.js
631150 tmp/ripgrep/toolchain_old/doc/search-index.js
541646 tmp/ripgrep/toolchain_new/doc/search-index.js
```
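To make the backref idea concrete, here is a minimal sketch of deduplicating repeated signatures with back-references into a small window of recently seen entries. It is not the PR's actual format: the `Entry` enum, the `compress`/`decompress` helpers, and the window size of 16 are all assumptions made for illustration. The point it tries to show is that a repeated signature is stored once and, on the decoding side, decoded once and then shared.

```rust
use std::collections::VecDeque;
use std::rc::Rc;

// Hypothetical window size; the real index format may use a different one.
const WINDOW: usize = 16;

/// Hypothetical in-memory form of one index entry.
#[derive(Debug, PartialEq)]
enum Entry {
    /// A signature written out in full.
    Literal(String),
    /// A reference to the n-th most recently written literal (0 = newest).
    BackRef(usize),
}

/// Replaces signatures that are still in the recent window with back-references.
fn compress(signatures: &[&str]) -> Vec<Entry> {
    let mut recent: VecDeque<&str> = VecDeque::with_capacity(WINDOW);
    let mut out = Vec::new();
    for &sig in signatures {
        if let Some(pos) = recent.iter().position(|&r| r == sig) {
            out.push(Entry::BackRef(pos));
        } else {
            out.push(Entry::Literal(sig.to_string()));
            if recent.len() == WINDOW {
                recent.pop_back();
            }
            recent.push_front(sig);
        }
    }
    out
}

/// Decodes each literal once; back-references share the already decoded value.
fn decompress(entries: &[Entry]) -> Vec<Rc<str>> {
    let mut recent: VecDeque<Rc<str>> = VecDeque::with_capacity(WINDOW);
    let mut out = Vec::new();
    for entry in entries {
        match entry {
            Entry::Literal(text) => {
                let decoded: Rc<str> = Rc::from(text.as_str());
                if recent.len() == WINDOW {
                    recent.pop_back();
                }
                recent.push_front(Rc::clone(&decoded));
                out.push(decoded);
            }
            // No new allocation: the same decoded value is reused.
            Entry::BackRef(pos) => out.push(Rc::clone(&recent[*pos])),
        }
    }
    out
}
```

Both sides update the window only when a literal is written, so the positions stay in sync without any extra bookkeeping in the serialized data.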
Can you post the difference for the search-index JS execution as well, please? It might also be nice to add some explanation of the algorithm used, so the code is easier to understand.
src/librustdoc/html/render/mod.rs
id.serialize(serializer)
// zig-zag notation
let value: u32 = (id << 1) | (if sign { 1 } else { 0 });
// encode
Could you specify which encoding this is, please?
Good point.
I've opened another PR, rust-lang/rustc-dev-guide#1846, which documents this.
Oh, that's super nice! Could you also specify the encoding used here?
Okay, I've added some comments describing the format.
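For readers following this thread, the sketch below shows one way the two steps discussed above can fit together: a sign bit folded into the lowest bit (as in the reviewed `(id << 1) | sign` line), followed by a variable-length hex encoding that needs no separators. The function names and the choice of alphabet (`'@'` plus the hexit for continuation characters, `` '`' `` plus the hexit for the last character of a number) are assumptions for illustration, so the output need not match the `[aAd]` example from the description.

```rust
/// Folds the sign into the lowest bit, as in the reviewed snippet.
fn zigzag(magnitude: u32, negative: bool) -> u32 {
    (magnitude << 1) | (negative as u32)
}

/// Inverse of `zigzag`: returns (magnitude, negative).
fn unzigzag(value: u32) -> (u32, bool) {
    (value >> 1, value & 1 == 1)
}

/// Appends `value` as hex digits, most significant first.
/// Assumed alphabet: every hexit except the last is '@' + hexit,
/// the last is '`' + hexit, so the end of a number is self-delimiting.
fn write_vlq_hex(value: u32, out: &mut String) {
    // Count significant hexits, always writing at least one.
    let mut hexits = 1;
    while hexits < 8 && (value >> (4 * hexits)) != 0 {
        hexits += 1;
    }
    for i in (0..hexits).rev() {
        let hexit = ((value >> (4 * i)) & 0xF) as u8;
        let base = if i == 0 { b'`' } else { b'@' };
        out.push((base + hexit) as char);
    }
}

/// Reads back every number from a string produced by `write_vlq_hex`.
fn read_vlq_hex(input: &str) -> Vec<u32> {
    let mut values = Vec::new();
    let mut n: u32 = 0;
    for b in input.bytes() {
        // Both '@' (0x40) and '`' (0x60) have a zero low nibble,
        // so the low nibble of the byte is the hexit itself.
        n = (n << 4) | u32::from(b & 0xF);
        if b >= b'`' {
            // A character from the "last hexit" range ends the number.
            values.push(n);
            n = 0;
        }
    }
    values
}
```

Under these assumptions, small values cost a single character and no commas are needed between numbers, which is where the size saving over a JSON array such as `[1,20]` comes from.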
Thanks! r=me when CI passes
@bors r=GuillaumeGomez rollup
…=GuillaumeGomez

rustdoc-search: tighter encoding for f index

Depends on rust-lang#119457

Two optimizations for the function signature search:

* Instead of using JSON arrays, like `[1,20]`, it uses VLQ hex with no commas, like `[aAd]`.
* This also adds backrefs: if you have more than one function with exactly the same signature, it'll not only store it once, it'll *decode* it once, and store in the typeIdMap only once.

Based partially on discussions on zulip: https://rust-lang.zulipchat.com/#narrow/stream/266220-t-rustdoc/topic/search.20index.20size

Performance
-----------

https://notriddle.com/rustdoc-html-demo-8/compression-perf-v2/index.html

### memory/time profiler output (for more details, consult the above link)

<table>
<thead><tr><th>benchmark<th>before<th>after</tr></thead>
<tbody>
<tr><th>arti<td>

```
user: 002.789 s
sys: 000.390 s
wall: 002.096 s
child_RSS_high: 440796 KiB
group_mem_high: 414924 KiB
```

</td><td>

```
user: 002.295 s
sys: 000.278 s
wall: 001.738 s
child_RSS_high: 314588 KiB
group_mem_high: 285220 KiB
```

</td></tr><tr><th>cortex-m<td>

```
user: 000.127 s
sys: 000.030 s
wall: 000.134 s
child_RSS_high: 60264 KiB
group_mem_high: 23824 KiB
```

</td><td>

```
user: 000.136 s
sys: 000.038 s
wall: 000.137 s
child_RSS_high: 59204 KiB
group_mem_high: 22712 KiB
```

</td></tr><tr><th>sqlx<td>

```
user: 000.887 s
sys: 000.118 s
wall: 000.592 s
child_RSS_high: 190408 KiB
group_mem_high: 157804 KiB
```

</td><td>

```
user: 000.798 s
sys: 000.101 s
wall: 000.525 s
child_RSS_high: 159292 KiB
group_mem_high: 126292 KiB
```

</td></tr><tr><th>stm32f4<td>

```
user: 013.884 s
sys: 005.399 s
wall: 013.149 s
child_RSS_high: 1942244 KiB
group_mem_high: 1954916 KiB
```

</td><td>

```
user: 006.128 s
sys: 003.297 s
wall: 007.994 s
child_RSS_high: 1038108 KiB
group_mem_high: 1023900 KiB
```

</td></tr><tr><th>ripgrep<td>

```
user: 000.441 s
sys: 000.063 s
wall: 000.264 s
child_RSS_high: 109180 KiB
group_mem_high: 74272 KiB
```

</td><td>

```
user: 000.408 s
sys: 000.044 s
wall: 000.238 s
child_RSS_high: 101488 KiB
group_mem_high: 66000 KiB
```

</td></tr></tbody></table>

Size change
-----------

standard library without gzip:

```console
$ du -bs search-index-old.js search-index-new.js
4976370 search-index-old.js
4404391 search-index-new.js
```

((4976370-4404391)/4404391)*100% ≈ 13.0% (difference relative to the new size)

with gzip:

```console
$ du -hs search-index-old.js.gz search-index-new.js.gz
520K search-index-old.js.gz
504K search-index-new.js.gz
$ du -bs search-index-old.js.gz search-index-new.js.gz
522092 search-index-old.js.gz
507654 search-index-new.js.gz
```

((522092-507654)/507654)*100% = 2.8%

Benchmarks are similarly shrunk. Without gzip:

```console
$ du -bs tmp/{arti,cortex-m,sqlx,stm32f4,ripgrep}/toolchain_{old,new}/doc/search-index.js
10555067 tmp/arti/toolchain_old/doc/search-index.js
8921236 tmp/arti/toolchain_new/doc/search-index.js
77018 tmp/cortex-m/toolchain_old/doc/search-index.js
66676 tmp/cortex-m/toolchain_new/doc/search-index.js
2876330 tmp/sqlx/toolchain_old/doc/search-index.js
2436812 tmp/sqlx/toolchain_new/doc/search-index.js
63632890 tmp/stm32f4/toolchain_old/doc/search-index.js
52337438 tmp/stm32f4/toolchain_new/doc/search-index.js
631150 tmp/ripgrep/toolchain_old/doc/search-index.js
541646 tmp/ripgrep/toolchain_new/doc/search-index.js
```

With gzip:

```console
$ du -bs tmp/{arti,cortex-m,sqlx,stm32f4,ripgrep}/toolchain_{old,new}/doc/search-index.js.gz
1618852 tmp/arti/toolchain_old/doc/search-index.js.gz
1582007 tmp/arti/toolchain_new/doc/search-index.js.gz
16109 tmp/cortex-m/toolchain_old/doc/search-index.js.gz
15831 tmp/cortex-m/toolchain_new/doc/search-index.js.gz
422257 tmp/sqlx/toolchain_old/doc/search-index.js.gz
411507 tmp/sqlx/toolchain_new/doc/search-index.js.gz
4454761 tmp/stm32f4/toolchain_old/doc/search-index.js.gz
4334924 tmp/stm32f4/toolchain_new/doc/search-index.js.gz
98312 tmp/ripgrep/toolchain_old/doc/search-index.js.gz
96864 tmp/ripgrep/toolchain_new/doc/search-index.js.gz
$ du -hs tmp/{arti,cortex-m,sqlx,stm32f4,ripgrep}/toolchain_{old,new}/doc/search-index.js.gz
1.6M tmp/arti/toolchain_old/doc/search-index.js.gz
1.6M tmp/arti/toolchain_new/doc/search-index.js.gz
24K tmp/cortex-m/toolchain_old/doc/search-index.js.gz
24K tmp/cortex-m/toolchain_new/doc/search-index.js.gz
424K tmp/sqlx/toolchain_old/doc/search-index.js.gz
412K tmp/sqlx/toolchain_new/doc/search-index.js.gz
4.3M tmp/stm32f4/toolchain_old/doc/search-index.js.gz
4.2M tmp/stm32f4/toolchain_new/doc/search-index.js.gz
108K tmp/ripgrep/toolchain_old/doc/search-index.js.gz
104K tmp/ripgrep/toolchain_new/doc/search-index.js.gz
```
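The percentages quoted for the standard library divide the size difference by the new size. As a small worked check, the sketch below recomputes them from the byte counts reported above and also shows the same difference relative to the old size, which is the figure usually meant by "reduction". The helper name `percent_of` is hypothetical.

```rust
/// Size difference (`old - new`) expressed as a percentage of `base`.
fn percent_of(old: u64, new: u64, base: u64) -> f64 {
    (old - new) as f64 / base as f64 * 100.0
}

/// Byte counts for the standard library index, taken from the `du -bs` output.
const PLAIN_OLD: u64 = 4_976_370;
const PLAIN_NEW: u64 = 4_404_391;
const GZIP_OLD: u64 = 522_092;
const GZIP_NEW: u64 = 507_654;

/// Returns (relative to new, relative to old) for the uncompressed index.
fn plain_percentages() -> (f64, f64) {
    (
        percent_of(PLAIN_OLD, PLAIN_NEW, PLAIN_NEW),
        percent_of(PLAIN_OLD, PLAIN_NEW, PLAIN_OLD),
    )
}

/// Returns (relative to new, relative to old) for the gzipped index.
fn gzip_percentages() -> (f64, f64) {
    (
        percent_of(GZIP_OLD, GZIP_NEW, GZIP_NEW),
        percent_of(GZIP_OLD, GZIP_NEW, GZIP_OLD),
    )
}
```

Either base is a legitimate way to report the change; what matters is stating which one is used, since the two figures differ by more than a percentage point for the uncompressed index.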
…mpiler-errors

Rollup of 9 pull requests

Successful merges:

- rust-lang#119208 (coverage: Hoist some complex code out of the main span refinement loop)
- rust-lang#119216 (Use diagnostic namespace in stdlib)
- rust-lang#119414 (bootstrap: Move -Clto= setting from Rustc::run to rustc_cargo)
- rust-lang#119420 (Handle ForeignItem as TAIT scope.)
- rust-lang#119468 (rustdoc-search: tighter encoding for f index)
- rust-lang#119628 (remove duplicate test)
- rust-lang#119638 (fix cyle error when suggesting to use associated function instead of constructor)
- rust-lang#119640 (library: Fix warnings in rtstartup)
- rust-lang#119642 (library: Fix a symlink test failing on Windows)

r? `@ghost`
`@rustbot` modify labels: rollup
Rollup merge of rust-lang#119468 - notriddle:notriddle/compression, r=GuillaumeGomez

rustdoc-search: tighter encoding for f index